Atomic-Resolution Structure of the Protein Encoded by Gene V of fd Bacteriophage in Complex with Viral ssDNA Determined by Magic-Angle Spinning Solid-State NMR

F-specific filamentous phages, elongated particles with circular single-stranded DNA encased in a symmetric protein capsid, undergo an intermediate step, where thousands of homodimers of a non-structural protein, gVp, bind to newly synthesized strands of DNA, preventing further DNA replication and preparing the circular genome in an elongated conformation for assembly of a new virion structure at the membrane. While the structure of the free homodimer is known, the ssDNA-bound conformation has yet to be determined. We report an atomic-resolution structure of the gVp monomer bound to ssDNA of fd phage in the nucleoprotein complex elucidated via magic-angle spinning solid-state NMR. The model presents significant conformational changes with respect to the free form. These modifications facilitate the binding mechanism and possibly promote cooperative binding in the assembly of the gVp–ssDNA complex.


■ INTRODUCTION
Bacteriophages, the most abundant entities on earth, are viruses that infect bacteria. Filamentous bacteriophages of the genus Inovirus are a family of viruses that predominantly infect Gram-negative bacteria. 1,2 Most of the current biological understanding of filamentous phages, including the phage mechanisms of infection, replication, and assembly, comes from extensive research conducted on a group of three closely related phages, known as F-specific filamentous phages (Ff phages). 3,4 Ff phages all infect Escherichia coli cells bearing F-pili organelles. The members of the Ff group (phages M13, fd, and f1) exhibit 98% DNA sequence identity and have therefore been studied interchangeably. 2,3,5 Uniquely among bacteriophages, infection by filamentous phages does not cause lysis of the host cell, 6 and the host bacterial cells continue to grow and divide as new virion structures are produced within their cytoplasm, at about one-half to three-quarters of the growth rate of uninfected cells. 7,8 First isolated from sewage systems in the early 1960s, Ff phages have since become prominent model systems, used widely in molecular biology research. 9−11
Ff phage particles have an elongated structure, approximately a micron in length and 6−7 nm in diameter. 12,13 The virion genome is structured as circular single-stranded DNA, encased in a symmetric protein capsid. Within the capsid, the ssDNA forms an anti-parallel two-stranded helix, structurally similar to double-stranded DNA, but it does not exhibit base-pair complementarity. Most of the capsid is composed of approximately 2700 copies of a 50-residue-long, mostly helical single-coat protein, gVIIIp, known as the major coat protein. 6 Four minor coat proteins, with only a few copies of each, cap both ends of the virion (gIIIp and gVIp cap one end, while gVIIp and gIXp cap the other end). 6 The wild-type Ff phage genome comprises around 6400 nucleotides, and it contains nine genes, encoding a total of 11 proteins. 4,14
The initial stage of the Ff phage life cycle is the infection of the Escherichia coli cell via attachment of gIIIp to the tip of the F-pilus, a helical surface filament that extends from the bacterial surface, containing stoichiometric amounts of proteins and phospholipids. 15,16 The pilus retracts toward the cell envelope, leading to the opening of the virion structure 17 and entry of the viral ssDNA into the cytoplasm of the host cell.
Next, the phage genome undergoes episomal replication within the host cell. The positive strand of the genome is used as a template to synthesize a negative strand, and together, both strands form a double-stranded phage genome, known as the replicative form (RF). 3 In the early stage after infection, RF is used for transcription of phage mRNA, followed by translation and expression of viral proteins. 18 RF undergoes the rolling circle replication mechanism, resulting in the generation of new positive strands of the phage genome, which are once again utilized for negative-strand synthesis. Once the expression level of a certain phage protein, gVp, has reached a critical threshold within the cytoplasm of the host, thousands of homodimers of gVp cooperatively bind to positive strands of phage ssDNA, coating nearly the entire two-stranded helix and forming a superhelical, left-handed nucleoprotein complex, 19 known as the premature virion. By disabling synthesis of the negative strand and subsequent generation of RF, further viral genome replication is prevented. 18 Formation of this nucleoprotein complex also packages the circular ssDNA in an elongated form, making the strand geometrically viable for assembly of the mature virion structure. 1 Once the intracellular complex reaches the cell membrane, a secretion-assembly process is initiated when an exposed hairpin loop at one end of the premature virion, serving as a packaging signal, interacts with the inner membrane and with phage proteins. 2,4 Next, in a process that has yet to be characterized in detail, the gVp dimers are stripped from the strand and replaced by the coat proteins, forming the newly assembled mature infectious virion, which is then extruded from the cell.
The binding of gVp dimers to ssDNA is highly cooperative 20 and is non-sequence-specific. 21 Depending on experimental conditions, several different binding modes have been described, with various stoichiometric ratios in the range of three to five nucleotides per monomer. 5 Under physiological in vitro conditions, the predominant stoichiometry reported is of four nucleotides per monomer. 18
Two high-quality models for the structure of free gVp were elucidated, one via X-ray crystallography (Figure 1) for gVp of f1 phage 22 (PDB ID: 1VQB) and another via solution NMR 23,24 (PDB ID: 2GVB) for a Y41H mutant of M13 phage gVp, which was found to increase protein solubility. 23 Both models are nearly identical in secondary and tertiary structures (the backbone root-mean-square deviation (RMSD) of the monomer, excluding the DNA binding loop, is 1.5 Å; for the dimer, it is 1.9 Å 24 ) and agree with biochemical and biophysical data previously collected on the system. 18,24 They differ in the location of the N-terminus with respect to the core, but the main difference involves the orientation of the DNA binding loop with respect to the core. This results in a solution NMR monomer that is slightly more globular than the X-ray structure, and since the former does not seem to have enough vacant volume available in the putative binding cleft, it has previously been suggested that this loop would be required to assume a conformation more similar to that of the X-ray structure for ssDNA binding to take place. 24
The gVp monomer is 87 residues long. The predominant, stable form in solution, across a range of pH, temperature, protein concentrations, and salt concentrations, is that of a symmetric homodimer, with a total molecular mass of 19.4 kDa. 18 The homodimer structure in crystals of gVp (Figure 1) is mostly stabilized by inter-monomer hydrophobic interactions. 25,26 The secondary structure of each monomer includes eight beta-strands and two 3₁₀ helices. 
Five of the beta strands form a right-handed twisted, anti-parallel, distorted beta sheet structure, known as a beta-barrel. 22 Interestingly, a similar folding motif termed OB (oligonucleotide/oligosaccharide binding) fold, based on a five-stranded beta-barrel, was identified in several oligonucleotide and oligosaccharide binding proteins. 27 The barrel forms the hydrophobic core of the monomer, from which four main loops protrude. 18 The first is the complex loop (residues 36−43), which is thought to be involved in dimer−dimer interactions related to the assembly of the nucleoprotein complex. 25 The second is the dyad loop (residues 68−78). The dyad loop of one monomer is in spatial proximity to that of the other monomer; therefore, they both comprise most of the inter-monomer contact surface of the homodimer. 28 A third loop region, previously referred to as a "broad connecting loop" 28 (residues 49−59 connecting the fourth and fifth beta strands, both of which participate in the beta-barrel motif), was reported to be in the vicinity of the viral genome in the complex. 18 Here, it will be termed the "core loop". The fourth is the DNA-binding loop (residues 13−31), which was shown to be involved in the binding of the viral genome.
Several findings point to the involvement of both the DNA-binding loop and the dyad loop in binding to ssDNA. First, the twofold symmetry axis relating both monomers gives rise to two concave clefts in the dimer structure, each formed by the DNA-binding loop of one monomer and the tip of the dyad loop of the other monomer. 28,29 These clefts can accommodate the two anti-parallel strands of ssDNA in the complex. Second, NMR studies have implicated residues in both loops as being in spatial proximity to bound nucleotides. 29 Third, both loops include several amide protons that have been shown to exhibit fast exchange with the solvent, which can indicate that the orientations of both loops with respect to the hydrophobic core are flexible. 18,26,30 Fourth, the calculated electrostatic potential at the surface of the putative binding domain, between the two loops, is highly positive, while the opposite side of the dimer, located at the outer surface of the nucleoprotein complex, exhibits nearly neutral surface charge. 31 Such an asymmetric charge distribution can facilitate binding to ssDNA as well as the formation of the symmetric nucleoprotein complex. The latter argument is further corroborated by NMR and fluorescence studies that have indicated that the binding of ssDNA by gVp dimers is mostly facilitated by electrostatic interactions between positively charged side-chain residues of the putative binding regions of the protein and the negatively charged sugar-phosphate ssDNA backbone. 18,32
The structure of the superhelical gVp−ssDNA nucleoprotein complex has yet to be determined at atomic resolution. No X-ray diffraction structure was reported, and several attempts have been made to computationally calculate a plausible model for the complex structure using global parameters collected empirically on the system. These models utilize the X-ray structure of the free form as a basis for the model, but as of yet, none of the suggested models were proven to be both accurate and in full agreement with available data regarding the complex. 25,31,33
Magic-angle spinning solid-state NMR (MAS ssNMR) is advantageous for probing systems of nucleotide-bound proteins: linewidths do not depend on the particle mass, long-range crystallinity is not required, and conformational changes can be detected sensitively, as even minor changes in the fold result in significant changes to the resonance positions. 34 In work previously reported by our lab, the isotropic chemical shifts of both free gVp and ssDNA-bound gVp have been assigned. 35,36 Quantification and analysis of chemical shift perturbations (CSPs) led to the conclusion that gVp undergoes significant structural changes upon binding, and regions expected to undergo the most extensive structural modifications (including the DNA-binding loop, the core loop, and the C-terminus) were identified. 36 MAS ssNMR is also useful for protein structure elucidation based on acquisition of internuclear distance-dependent interactions and prediction of backbone torsion angles based on chemical shifts, 37 as well as acquisition of additional structural restraints. 38,39 Given a sufficient number of restraints distributed along different regions of the sequence, a carefully designed calculation process can lead to a converged, viable structural model. Several protein structures have previously been solved via MAS ssNMR, and beyond crystalline monomers, more complex systems include the homodimeric Crh protein, 40 CAP-Gly bound to microtubules, 41 α-synuclein 42 and additional amyloid aggregates, 43 and the Anabaena sensory rhodopsin membrane protein in lipid bilayers. 44
In this research, we describe our calculated, atomic-resolution model for the structure of the gVp monomer in complex with ssDNA of fd phage. We compare the structure to that of free gVp and discuss the biological significance of our findings, which provide insight into the binding process of gVp to Ff phage ssDNA.

■ RESULTS AND DISCUSSION
Generation of NMR Distance Restraints. Chemical shift assignments of the ssDNA-bound form of gVp were reported previously 36 and are available at the Biological Magnetic Resonance Bank (BMRB accession id: 51391). In order to elucidate information on internuclear distances from these assignments, we acquired several two-dimensional (2D) 13C−13C dipolar-based MAS ssNMR experiments, where cross-peak correlations arise from pairs of NMR-active nuclei in spatial proximity. Depending on experimental parameters and conditions, 13C pairs with internuclear distances up to approximately 8 Å can potentially give rise to spectrally detectable cross-peaks. 40,45,46 Dipolar-assisted rotational resonance (DARR 47 ) and combined R2_n^v-driven (CORD 48 ) experiments were conducted at various mixing times (15−300 ms, see SI Tables S1, S2, and S3). In addition, we conducted CHHC 49 experiments, which have the advantage of direct transfer of polarization between protons that are close in space in folded regions of the polypeptide and have stronger dipolar couplings than 13C spin pairs. 50,51 The 2D experiments were conducted on both fully and sparsely 13C-labeled samples of gVp in complex with unlabeled full-length ssDNA extracted directly from fd phage (fth1 strain, 8233 nucleotides 52 ). Sparsely labeled samples, where only a subset of carbon sites are isotopically labeled, give rise to simplified, less crowded spectra with reduced effects of dipolar truncation and relayed polarization transfer, allowing better detection of long-range contacts. 53 The resulting spectra also entail lower ambiguity levels on average, and therefore fewer assignments may be attributed to each cross-peak.
Cross-peak lists generated from eight spectra of the fully labeled sample, differing in experiment type, mixing time, and processing parameters, were concatenated into a single list. Similarly, 30 spectra of the sparsely labeled sample were combined. In order to generate a list of distance restraints, home-written Python scripts were provided with the two peak lists and the assigned chemical shifts of ssDNA-bound gVp as input. A chemical shift tolerance window of 0.3 ppm (a value typically chosen for ssNMR spectra 41,54 ) was used in order to attribute all possible assignments to each cross-peak, resulting in a set of both ambiguous distance restraints (ADRs 55 ) and unambiguous restraints (a single possible assignment). All restraints were set to a range of 2−8 Å. 40,56 For restraints arising from the sparsely labeled data sets, we utilized a probability threshold of 40% in order to discard possible assignments that corresponded to pairs of carbon sites with a low probability of both being isotopically labeled. That is, we only considered an assignment to a cross-peak as plausible if the product of the probabilities of both carbon sites to be 13C-labeled in the sample was 40% or higher. Those statistics were derived using previously published statistical estimates for the effective probabilities of specific carbon sites in each type of amino acid to be 13C-labeled in a sample prepared with [1,3-13C]-glycerol as the sole carbon source. 53,57 Both lists were further simplified by discarding restraints that had more than 20 possible assignments, as this was shown, in the context of solution NMR, to have the potential to improve the quality of the calculated model. 58,59 An attempt to discard restraints that had more than five possible assignments resulted in failure to obtain a converged structure. 
An initial list of ADRs and non-ambiguous restraints was obtained by aggregating both restraint lists (fully and sparsely labeled) while giving precedence to restraints arising from the inherently less ambiguous, more informative sparse data, in spectral regions where the two peak-lists overlapped (see details regarding generation of distance restraints in the SI).
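The assignment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' scripts: the function names, the dictionary layout (atoms keyed by a hypothetical `(residue, atom_name)` tuple), and the toy inputs are all invented here. Only the 0.3 ppm tolerance, the 40% joint labeling threshold, the limit of 20 possible assignments, and the 2−8 Å bounds come from the text.

```python
from itertools import product

TOLERANCE_PPM = 0.3           # chemical shift tolerance window (from the text)
MIN_LABEL_PROB = 0.40         # threshold on the joint 13C-labeling probability
MAX_ASSIGNMENTS = 20          # restraints with more candidates are discarded
BOUNDS_ANGSTROM = (2.0, 8.0)  # distance range assigned to every restraint


def candidates(shift, assigned_shifts, tol=TOLERANCE_PPM):
    """Atoms whose assigned shift lies within `tol` ppm of a peak coordinate."""
    return [atom for atom, s in assigned_shifts.items() if abs(s - shift) <= tol]


def peak_to_restraint(peak, assigned_shifts, label_prob=None):
    """Convert one 2D cross-peak (w1, w2) into a restraint dict, or None.

    `label_prob` (hypothetical layout: atom -> labeling probability) is only
    passed for sparsely labeled samples.
    """
    w1, w2 = peak
    pairs = []
    for a, b in product(candidates(w1, assigned_shifts),
                        candidates(w2, assigned_shifts)):
        if a == b:
            continue  # diagonal peak, carries no distance information
        if label_prob is not None and label_prob[a] * label_prob[b] < MIN_LABEL_PROB:
            continue  # unlikely that both sites are 13C-labeled
        pairs.append((a, b))
    if not pairs or len(pairs) > MAX_ASSIGNMENTS:
        return None
    return {"pairs": pairs, "bounds": BOUNDS_ANGSTROM, "ambiguous": len(pairs) > 1}
```

A restraint with a single surviving pair is unambiguous; one with several pairs is an ADR. Passing `label_prob` shows how sparse labeling can turn an ambiguous cross-peak into an unambiguous one.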
We applied several filters on the aggregated set of distance restraints in an attempt to discard restraints that do not arise from carbon−carbon correlations of structural significance, or those that are in significant violation of prior knowledge on the system. (1) We removed restraints that included a single-bond contact as one or more of the possible assignments, since such correlations are assumed to be much stronger than long-range contacts, and do not report on the overall fold of the protein.
(2) We assumed that any pair of carbon sites further apart than 16 Å in the X-ray structure of free gVp will not be close enough in the ssDNA-bound form of the protein to result in a spectrally detectable cross-peak. While evidence previously reported by our lab has demonstrated that the structure of the gVp protein undergoes significant changes upon binding, 36 we can safely assume that the complex assembly process does not entail a complete rearrangement of the overall fold. 33 (3) We removed contacts with a high probability of association with the homodimer interface. NMR structure calculations of multimers are challenging due to the inability to a priori distinguish between cross-peaks arising from intra- and inter-monomer correlations. 60 When attempting to calculate the monomer structure, any inter-monomer correlation arising from the dimer interface that is erroneously designated as an intra-monomer contact would introduce conformational errors and therefore distort the elucidated fold of the monomer model. We therefore ruled out (using a home-written Python script) all restraints that included possible assignments corresponding to an inter-monomer distance shorter than 7 Å in the X-ray structure of the free gVp homodimer, under the assumption that the dimer interfaces of the free and ssDNA-bound gVp dimer structures bear similarity. This assumption is further justified by findings previously reported by our lab, which demonstrated that the CSPs upon binding are relatively low in the dimer interface. 36 The resulting set of restraints, after the application of the three filters, was the initial set provided as input to the calculation process. A total of 1901 distance restraints were provided as input to the initial step of the structural calculations, including 251 unambiguous restraints.
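The three filters can be summarized with a short Python sketch. It is an illustration only: the function name, the data layout (a restraint as a list of candidate atom pairs, reference coordinates as dictionaries for the two monomers of the free X-ray dimer), and the toy inputs are assumptions made here. In particular, it is assumed that filter 2 prunes individual candidate pairs while filters 1 and 3 discard the whole restraint; the text states this explicitly only for filters 1 and 3.

```python
import numpy as np


def dist(p, q):
    """Euclidean distance between two 3D points."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))


def apply_filters(restraints, mono_a, mono_b, bonded, far=16.0, interface=7.0):
    """Filter restraints against a reference dimer structure.

    restraints: list of restraints, each a list of candidate (atom, atom) pairs
    mono_a, mono_b: atom -> xyz for the two monomers of the reference dimer
    bonded: set of frozenset({atom, atom}) for directly bonded carbon pairs
    """
    kept = []
    for pairs in restraints:
        # (1) discard restraints containing a single-bond contact
        if any(frozenset(p) in bonded for p in pairs):
            continue
        # (3) discard restraints that may arise from the dimer interface
        if any(min(dist(mono_a[a], mono_b[b]), dist(mono_a[b], mono_b[a])) < interface
               for a, b in pairs):
            continue
        # (2) prune candidates that are too far apart within the monomer
        pruned = [(a, b) for a, b in pairs if dist(mono_a[a], mono_a[b]) <= far]
        if pruned:
            kept.append(pruned)
    return kept
```

Checking the inter-monomer distance in both directions is redundant for a perfectly symmetric dimer, but it keeps the sketch correct if the two reference chains differ slightly.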
Generation of Torsion Angle Restraints. The software tool TALOS+ was used in order to generate restraints on the ψ−ϕ torsion angles of the protein backbone. 37,61 A total of 142 torsion angle restraints were deemed consistent by the software and were used as input for the subsequent calculation process.
Structure Calculation. We used the Xplor-NIH 62 software package iteratively to calculate the protein structure. Both the filtered distance restraints and the predicted angle restraints were provided as input, along with the protein sequence.
During each iteration, a total of N structures (values used in the range of 100−250, see SI Table S5 for details) were generated. Afterward, these structures were sorted according to the energy score, and the k lowest energy structures (values given in SI Table S5) were further analyzed in order to update the restraints to be provided as input to the subsequent iteration. After each iteration, we used two filtration methods in order to better inform the set of restraints to be provided as input to the next Xplor-NIH run. First, distance and angle restraints that were significantly violated in a large subset of the top k lowest energy structures were removed (see Structure Calculation details in the SI). Second, the average structure of the k lowest energy structures was calculated by Xplor-NIH and any possible assignments corresponding to carbon pairs further apart than a specific cutoff distance (16 Å down to 8.5 Å) in the average structure were ruled out. In doing so, we treated the average structure of the previous run as a low-resolution model of the protein structure and accordingly modified the distance restraints for the next iteration. We conducted a total of 12 iterations of Xplor-NIH, with home-written Python scripts used intermittently for the restraint input modification described above (see Xplor scripts in the SI). Each iteration started from an arbitrary, extended conformation of the protein sequence, with ideal covalent geometry. Starting from an initial cutoff distance of 16 Å for distance restraint filtration from the 25-structure-average output of the first run, we gradually and iteratively decreased this filtration cutoff down to 8.5 Å (with k = 10), so that with each passing iteration, this filter on distance restraints was more strictly reliant on the outcome of the previous run. 
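The two between-iteration filters can be sketched as follows. This is a simplified illustration rather than the scripts used with Xplor-NIH: all function names and default parameters are hypothetical, structures are assumed to be already superimposed NumPy arrays of shape (n_atoms, 3), atoms are referred to by integer index, and an ambiguous restraint is judged by its closest candidate pair (a simplification of how ADRs are usually evaluated). The violation tolerance and the "large subset" fraction are placeholders; the actual criteria are given in the SI.

```python
import numpy as np


def lowest_energy(structures, energies, k):
    """Return the k structures with the lowest energy scores."""
    order = np.argsort(energies)[:k]
    return [structures[i] for i in order]


def prune_by_average(restraints, top, cutoff):
    """Drop candidate pairs farther apart than `cutoff` in the mean structure."""
    mean = np.mean(np.stack(top), axis=0)  # assumes pre-aligned structures
    out = []
    for pairs in restraints:
        kept = [(i, j) for i, j in pairs
                if np.linalg.norm(mean[i] - mean[j]) <= cutoff]
        if kept:
            out.append(kept)
    return out


def drop_violated(restraints, top, upper=8.0, tol=0.5, max_fraction=0.5):
    """Remove restraints violated in more than `max_fraction` of `top`.

    `tol` and `max_fraction` are placeholder values, not the paper's criteria.
    """
    out = []
    for pairs in restraints:
        violated = sum(
            min(np.linalg.norm(s[i] - s[j]) for i, j in pairs) > upper + tol
            for s in top
        )
        if violated / len(top) <= max_fraction:
            out.append(pairs)
    return out
```

Calling `prune_by_average` with a cutoff that shrinks from 16 Å toward 8.5 Å over successive runs mirrors the tightening described above, since each run relies more strictly on the previous average structure.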
Revisions of the distance restraints list throughout the calculation process resulted in an iterative decrease in both the overall number of restraints and their ambiguity levels (see Figure S3 in the SI). The final set of restraints included 112 torsion angle restraints and 1247 distance restraints, including 593 unambiguous restraints. Figure 2 displays several processed spectra, acquired with a variety of mixing times and pulse sequences, and illustrates cross-peaks that gave rise to unambiguous medium- and long-range restraints included in this final set.
Finally, the lowest energy structure of the 12th iteration was provided as an initial structure to a refinement process, conducted in implicit solvent. 63
Structure Validation. The average Cα-RMSD of the final 10-structure ensemble with respect to their average structure is 1.2 Å across all 87 residues (see Figure S4). The backbone RMSD of the well-defined regions of the ensemble (residues 2−16, 28−37, and 41−87, as determined by the PDB structure validation report; this mainly excludes the DNA-binding loop) is 0.37 Å, well below the value of 2.0 Å, often used as the preliminary criterion for structural convergence (e.g., by the structure calculation software CS-Rosetta 64 ).
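For readers who want to reproduce an ensemble RMSD of this kind, a minimal NumPy sketch is given below. It is not the procedure used by Xplor-NIH or the PDB validation pipeline; it assumes each model is an (n_atoms, 3) array of the selected atoms (for example Cα), superimposes models with the Kabsch algorithm, and refines the average structure over a fixed, arbitrarily chosen number of rounds. All function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Least-squares superposition of `mobile` onto `target` (both (n, 3))."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against improper rotations
    R = U @ np.diag([1.0, 1.0, d]) @ Vt  # rotation acting on row vectors
    return P @ R + tc


def rmsd(a, b):
    """Root-mean-square deviation between two matched coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def ensemble_rmsd(models, n_rounds=5):
    """Mean RMSD of each model to the ensemble average structure."""
    ref = models[0]
    for _ in range(n_rounds):
        fitted = [kabsch_fit(m, ref) for m in models]
        ref = np.mean(fitted, axis=0)    # updated average structure
    fitted = [kabsch_fit(m, ref) for m in models]
    return float(np.mean([rmsd(f, ref) for f in fitted]))
```

Restricting the input arrays to the well-defined residues before calling `ensemble_rmsd` corresponds to the second value quoted above, while passing all 87 Cα atoms corresponds to the first.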
In order to verify convergence, we used Xplor-NIH to conduct restraint violation analysis on the refined ensemble. Of the 1247 distance restraints and 112 angle restraints provided to the final refinement step, fewer than 2% of the distance restraints and fewer than 7% of the angle restraints were violated (by more than 0.5 Å or 5°, respectively) in some or all members of the ensemble (see SI Tables S6−S9). Therefore, the refined ensemble is in good agreement with the final set of input restraints, with the vast majority of internuclear distances and torsion angles within the bounds of the experimentally derived and iteratively filtered constraints. Statistics based on the Ramachandran plot 65 of the ψ−ϕ torsion angles of the ensemble indicated that none of the analyzed, well-defined residues (see Table S10 and Figure S5 in the SI) lie in the disallowed regions. This indicates that the majority of the protein backbone torsion angles are geometrically feasible in terms of sidechain steric hindrance. This finding is further corroborated by the MolProbity clash score, 66 depicting the average number of steric atomic clashes larger than 0.4 Å per 1000 atoms (see details in the SI). The calculated ensemble clash score (calculated for well-defined residues, as determined by the PDB structure validation report) is 26, which is on par with MolProbity clash scores previously reported for other protein structures solved via MAS ssNMR (CAP-Gly bound to microtubules, PDB ID: 2MPX, 41 clash score 24; Crh dimer, PDB ID: 2RLZ, 40 clash score 27; GB1, PDB ID: 2KQ4, 67 clash score 28). The detailed structural statistics of the final ensemble are provided in Table 1.
Characterization of the Calculated Ensemble. The 13-step structure calculation process resulted in a final refined ensemble of 10 structures shown in Figure 3, which we report as our model for the gVp monomer in complex with full-length, 8233-nucleotide-long ssDNA of fd phage. The structure was deposited to the Protein Data Bank with PDB ID 8ACZ. The Stride secondary structure determination algorithm 68  (Figure 4). The core loop is also well-defined, with a Cα-RMSD of 1.0 Å.
Upon alignment of the ensemble structures, it can be seen that the orientation of the DNA-binding loop with respect to the hydrophobic core varies considerably, resulting in a high Cα-RMSD value of 2.5 Å. This variability may also be attributed to structural flexibility of the DNA binding loop with respect to the core, as previously reported. 23 Such conformational plasticity may be beneficial for the binding of gVp dimers to ssDNA as part of the nucleoprotein complex assembly process. The high Cα-RMSD of this loop may also be explained by the long distance between the tip of the loop and the core regions of the protein, making it challenging to lock its orientation with respect to the rest of the protein using structural information composed of internuclear distances up to approximately 8 Å.
Comparison of the Bound gVp Model to Free gVp. In a visual comparison of the ensemble-averaged structure of ssDNA-bound gVp with that of the X-ray structure of free gVp (Figure 5), it is apparent that the overall global fold is mostly conserved. The four main loops of free gVp (DNA, dyad, core, and complex) are all clearly visible in the bound form, even though some of their secondary structure elements, well-defined in the free form, are absent (described below). In addition, the beta sheet arrangement of the core of the monomer is mostly preserved. These results are in agreement with the assumption made during filtration of the initial set of distance restraints, that the tertiary structure is not changed altogether upon binding to phage ssDNA.
There are several significant structural changes detected throughout the sequence, reflected overall by a backbone (N, Co, Cα) RMSD of 6.4 Å between the ensemble-averaged structure of bound gVp and the X-ray structure of free gVp.   Such significant conformational changes accompanying the nucleoprotein complex assembly process are expected given the CSP analysis previously reported by our lab 36 and agree well with the results reported here, as will be shown in detail.
Relative to the X-ray structure, the location of the C-terminus undergoes a significant structural change (Figure 5C,D). This final section of the sequence (residues 69−87) folds back toward an entirely different region of the protein. Some examples of unambiguous long-range contacts restricting this region (connecting carbon sites at the C-terminus region to carbon sites belonging to the hydrophobic core) are provided in the SI (Figures S7 and S8, Tables S11 and S12), corroborating this aspect of the calculated structure. Also, this result agrees well with the findings from the previous CSP results. 36 The motional pathway for the transition between the free and bound gVp molecular conformations remains puzzling in terms of the change to the location of the C-terminus with respect to the core, and further research is required in order to decipher the trajectory of such a structural change.
Another major structural change relates to the concave clefts of each dimer structure, hypothesized to accommodate the bound strands of the ssDNA. As mentioned, upon dimerization of two gVp monomers, two concave clefts are formed in the homodimer structure, each located between the dyad loop of one monomer and the DNA-binding loop of the other. 18 As can be seen in our ensemble-averaged bound gVp structure, the available space between the DNA-binding loop and the dyad loop of the monomer is significantly decreased upon binding (Figures 5A,B and 6). This change may be described in terms of the proximity of both of these loops to the core loop. It is the decrease in both inter-loop distances (DNA-binding loop−core loop and dyad loop−core loop) that gives rise to a narrowing of the vacant volume between the loops that is available for the bound strands of nucleotides. Such a modification may facilitate the complexation of the nucleoprotein assembly. In order to quantify this change, we define an inter-loop distance as the average of all pairwise distances between all possible inter-loop pairs of Cα carbons. Using this definition, averaged across all 10 members of the ensemble (see details in Table S13 in the SI), we calculated that the dyad loop−core loop inter-loop distance decreases by 6.6 Å (from 25.2 to 18.6 Å) upon binding, and the DNA-binding loop−core loop inter-loop distance decreases by 4.4 Å (from 18.4 to 14.0 Å). Once again, this finding is corroborated by the fact that both the DNA-binding loop and the core loop are expected to undergo significant structural changes, according to previously reported CSP analysis. 36
While the results shown here indicate a narrowing of the cleft of a single monomer, if the twofold symmetry axis is preserved upon binding, then the observed approach of the DNA-binding loop and the dyad loop in an intra-monomer sense is expected to give rise to a shortening of two inter-monomer distances, each defined by the dyad loop of one monomer and the DNA-binding loop of the other monomer. This observation is significant, since it has long been reported that the binding of the dimer to the two anti-parallel strands of ssDNA involves the approach of two inter-monomer pairs of loops (DNA-binding loop and dyad loop) toward one another, with each such pair binding to one of the two strands forming the helical structure within the complex. 18
The free and bound gVp structures also differ in the recognition of secondary structure elements (Figure 7).
According to the Stride 68 algorithm, the two short 3₁₀ helices reported in the free X-ray structure are no longer detected in the bound structure. Ramachandran analysis comparing the free and bound forms indicates that this discrepancy arises due to slight shifts of the corresponding residues in ψ−ϕ space (see Figure S6). Of the five beta strands comprising the beta barrel in the free form, four are partially detected in the bound model. These strands, recognized at the region most central to the hydrophobic core, form a short anti-parallel beta sheet, which is a subset of the beta barrel motif reported in the free structure. The fifth strand that is not detected at all in the bound form is located at the C-terminus of the free structure, where our model exhibits a significant change in tertiary structure (Figure 5C,D), and therefore, it is not surprising that this is accompanied by a change to the secondary structure. The DNA-binding loop protruding from the core includes the longest pair of anti-parallel beta-strands in the free gVp structure, and these strands, for the most part, are not detected in the bound form (only a short section of one of these strands, which is located at the core, at residues 43−46, is detected). It may be that the change in the orientation of the DNA-binding loop with respect to the core prevents the formation of hydrogen-bonded regions in both strands; therefore, the beta strands are no longer present in the bound form. While many residues corresponding to the beta-strand regions of free gVp do undergo small shifts in ψ−ϕ space, several residues do exhibit significant changes (see Figure S6). Since some of the beta-strands of free gVp are short, the residues that exhibit larger changes to their torsion angles, due to the tertiary changes throughout the structure, suffice to prevent determination of several regions as beta-strands. 
Journal of the American Chemical Society pubs.acs.org/JACS Article
The free form X-ray structure also includes a short antiparallel beta ladder at the dyad loop (residues 68−70 and 76−78 in the free structure), which is not detected in the bound form by the Stride algorithm. In the bound structure, the conformation of both sections of the protein backbone with respect to one another is no longer viable for the formation of a set of hydrogen bonds necessary for formation of beta strands (Figure 5B). Yet, the location of the dyad loop (residues 68−78) with respect to the core is preserved overall.
Another interesting structural difference involves the alignment of the core loop with respect to the DNA-binding loop and the dyad loop (Figure 8). In the free gVp structure, the core loop protrudes outward from the hydrophobic core in an entirely different direction from the other two loops, but in the bound model, it is aligned along the same vertical axis as the DNA-binding loop and the dyad loop. This change in conformation may be involved in the mechanism by which the protein binds the DNA; that is, the alignment of all three loops may result from their binding interactions with the ssDNA. It is also possible that such a reordering of the loops in 3D space results from a transition of the monomer to a more compact conformation, as the homodimers are arranged adjacently to form the protein coat of the complex.
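One simple way to quantify such a loop alignment, offered here only as a sketch and not as the procedure used to prepare Figure 8, is to compare the directions in which two loops protrude from the core. The helper names (`centroid`, `loop_direction`, `angle_between`) and the toy coordinates are assumptions made for illustration. Each direction is taken as the vector from the centroid of the core atoms to the centroid of the loop atoms, and the angle between two such vectors is reported in degrees.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def loop_direction(core_points, loop_points):
    """Vector from the core centroid to the loop centroid."""
    c = centroid(core_points)
    l = centroid(loop_points)
    return tuple(l[i] - c[i] for i in range(3))

def angle_between(u, v):
    """Angle (degrees) between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding
    return math.degrees(math.acos(cos_angle))

# Toy coordinates (hypothetical, not from the gVp models).
core = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
loop_a = [(0.0, 0.0, 5.0), (0.0, 0.0, 7.0)]   # protrudes along +z
loop_b = [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]   # protrudes along +x

angle_ab = angle_between(loop_direction(core, loop_a),
                         loop_direction(core, loop_b))
```

In this convention, loops protruding in the same direction give an angle near 0°, perpendicular loops give an angle near 90°, and loops lying on the same axis but on opposite sides of the core give an angle near 180°.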

■ CONCLUSIONS
We have reported the first high-resolution 3D model of the gVp monomer in the conformation it adopts when bound to full-length ssDNA of the fd bacteriophage. Our model is directly based on restraints elucidated from experimental magic-angle spinning NMR data. The structure calculation protocol makes minimal use of the known free structure in order to improve the quality of the restraint input to the calculation of the bound form. It also takes measures to avoid conformational distortions that may arise from inter-monomer contacts of the homodimer.
This structure joins a relatively short list of non-crystalline protein structures that have been determined via MAS ssNMR in the context of their entire high-molecular-weight biological assembly, further demonstrating the power of this method in structural studies. As the use of AI methods for accurate determination of protein structure based solely on sequence is becoming prevalent, MAS ssNMR presents a tool to study specific structural changes involved in processes of DNA binding or complex assembly.
Results previously reported by our group have presented spectral evidence that gVp undergoes significant structural changes upon binding to ssDNA in the intracellular nucleoprotein complex. CSP analysis has detected regions of interest along the protein sequence, where such modifications to the conformation were expected to be most prominent. The structure reported here confirms the three main regions that were predicted to be most conformationally altered in the bound form: the DNA-binding loop protrudes from the core at a different angle compared to the free structure; the location of the C-terminus with respect to the hydrophobic core is changed, which constitutes a large structural change with respect to the free protein; and the core loop, which protrudes in a different direction from the DNA-binding loop and the dyad loop in the free form, is aligned with both loops in the bound form. These conformational changes seem to facilitate two processes: the binding to viral ssDNA, as the concave cleft where the ssDNA is hypothesized to be bound to the protein assumes a narrower conformation, and the cooperative binding interactions of dimers forming the capsid of the complex, as the entire protein assumes a more compact conformation.
Experimental parameters of NMR experiments; processing parameters of the spectral data; technical details on the generation of structural restraints; Xplor-NIH scripts; detailed results of the structural validation of the final ensemble (PDF). The Python scripts used for structure calculations are available via a GitHub repository at https://github.com/YoavShamir5/Atomic-Resolution-Structure-of-the-Protein-Encoded-by-Gene-V-of-fd-Bacteriophage-in-Complex-. The raw NMR data are available via Zenodo, https://dx.doi.org/10.5281/zenodo.7294474.

■ AUTHOR INFORMATION
Funding